Create a data frame from the csv file.

read.table - Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

List of some arguments

read.table(file, header = FALSE, sep = "", dec = ".", row.names, col.names, na.strings = "NA", nrows = -1, skip = 0, comment.char = "#", fileEncoding = "", encoding = "unknown")

file: the name of the file which the data are to be read from.
      Each row of the table appears as one line of the file.  If it
      does not contain an _absolute_ path, the file name is
      _relative_ to the current working directory, ‘getwd()’.
      Tilde-expansion is performed where supported.  This can be a
      compressed file (see ‘file’).

header: a logical value indicating whether the file contains the
      names of the variables as its first line.  If missing, the
      value is determined from the file format: ‘header’ is set to
      ‘TRUE’ if and only if the first row contains one fewer field
      than the number of columns.

sep: the field separator character.  Values on each line of the
      file are separated by this character.  If ‘sep = ""’ (the
      default for ‘read.table’) the separator is ‘white space’,
      that is one or more spaces, tabs, newlines or carriage
      returns.

quote: the set of quoting characters. To disable quoting altogether,
      use ‘quote = ""’.  See ‘scan’ for the behaviour on quotes
      embedded in quotes.  Quoting is only considered for columns
      read as character, which is all of them unless ‘colClasses’
      is specified.

dec: the character used in the file for decimal points.

numerals: string indicating how to convert numbers whose conversion to
      double precision would lose accuracy, see ‘type.convert’.
      Can be abbreviated.  (Applies also to complex-number inputs.)

row.names: a vector of row names.  This can be a vector giving the
      actual row names, or a single number giving the column of the
      table which contains the row names, or character string
      giving the name of the table column containing the row names.

      If there is a header and the first row contains one fewer
      field than the number of columns, the first column in the
      input is used for the row names.  Otherwise if ‘row.names’ is
      missing, the rows are numbered.

      Using ‘row.names = NULL’ forces row numbering. Missing or
      ‘NULL’ ‘row.names’ generate row names that are considered to
      be ‘automatic’ (and not preserved by ‘as.matrix’).

col.names: a vector of optional names for the variables.  The default
      is to use ‘"V"’ followed by the column number.


nrows: integer: the maximum number of rows to read in.  Negative and
      other invalid values are ignored.

skip: integer: the number of lines of the data file to skip before
      beginning to read data.

comment.char: character: a character vector of length one containing a
      single character or an empty string.  Use ‘""’ to turn off
      the interpretation of comments altogether.

stringsAsFactors: logical: should character vectors be converted to
      factors?  Note that this is overridden by ‘as.is’ and
      ‘colClasses’, both of which allow finer control.

fileEncoding: character string: if non-empty declares the encoding used
      on a file (not a connection) so the character data can be
      re-encoded.  See the ‘Encoding’ section of the help for
      ‘file’, the ‘R Data Import/Export Manual’ and ‘Note’.

encoding: encoding to be assumed for input strings.  It is used to mark
      character strings as known to be in Latin-1 or UTF-8 (see
      ‘Encoding’): it is not used to re-encode the input, but
      allows R to handle encoded strings in their native encoding
      (if one of those two).  See ‘Value’ and ‘Note’.

text: character string: if ‘file’ is not supplied and this is, then
      data are read from the value of ‘text’ via a text connection.
      Notice that a literal string can be used to include (small)
      data sets within R code.

skipNul: logical: should nuls be skipped?



In [ ]:
data<-read.table("data/root_length.csv",sep=";",dec=",",header=TRUE)
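As a minimal sketch of how sep, dec and header interact, the same call can be tried on a small inline data set passed through the text argument. The column names and values below are invented for illustration and are not taken from root_length.csv.

```r
# Invented toy input: ";" separates fields and "," is the decimal mark.
csv_text <- paste("Plant;Length_1;Lat_roots",
                  "A;1,5;3",
                  "B;4,0;7",
                  sep = "\n")

toy <- read.table(text = csv_text, sep = ";", dec = ",", header = TRUE)

str(toy)          # Length_1 is numeric because dec = "," was given
toy$Length_1 + 1  # arithmetic works on the parsed values
```

Without dec = ",", a column such as Length_1 would be read as text rather than as numbers.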

General analysis of the data

Head function

Returns the first parts of a vector, matrix, table, data frame or function.

Tail function

Returns the last parts of a vector, matrix, table, data frame or function.

In [ ]:
head(data)
tail(data)
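Both functions accept an n argument that controls how many rows are returned. The sketch below uses an invented data frame, not the tutorial's data.

```r
toy <- data.frame(id = 1:10, value = (1:10)^2)  # invented data

first_three <- head(toy, n = 3)    # first three rows
last_two    <- tail(toy, n = 2)    # last two rows
all_but_8   <- head(toy, n = -8)   # everything except the last eight rows

first_three
last_two
all_but_8
```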

Summary function

summary is a generic function used to produce result summaries.

In [ ]:
summary(data)
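Because summary is generic, what it reports depends on the class of its argument. A small sketch with invented values shows the difference between a numeric vector and a factor:

```r
root_len <- c(1, 2, 3, 4, 10)          # invented numeric values
groups   <- factor(c("a", "b", "a"))   # invented factor

num_summary <- summary(root_len)   # minimum, quartiles, mean and maximum
fac_summary <- summary(groups)     # number of observations per level

num_summary
fac_summary
```

For a data frame, summary applies the appropriate method to each column.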

Table function

table uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.

Let's show it for the number of lateral roots

In [ ]:
table(data$Lat_roots)
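For a single vector, table simply counts how often each distinct value occurs. A sketch on invented counts (not the real Lat_roots column):

```r
lat_roots <- c(3, 5, 3, 7, 5, 3)   # invented values

counts <- table(lat_roots)
counts           # 3 occurs three times, 5 twice, 7 once
names(counts)    # the distinct values, stored as character strings
length(counts)   # number of distinct values
```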

Hist function

The generic function hist computes a histogram of the given data values.

You can define the number of breaks

breaks: one of:

        • a vector giving the breakpoints between histogram cells,

        • a function to compute the vector of breakpoints,

        • a single number giving the number of cells for the
          histogram,

        • a character string naming an algorithm to compute the
          number of cells (see ‘Details’),

        • a function to compute the number of cells.

      In the last three cases the number is a suggestion only; as
      the breakpoints will be set to ‘pretty’ values, the number is
      limited to ‘1e6’ (with a warning if it was larger).  If
      ‘breaks’ is a function, the ‘x’ vector is supplied to it as
      the only argument (and the number of breaks is only limited
      by the amount of available memory).

We will explicitly request one break for each value in the contingency table. Because a single number is only a suggestion, hist may still choose a different number of cells.

In [ ]:
num_of_breaks=length(table(data$Lat_roots))

hist(data$Lat_roots,breaks = num_of_breaks)
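Since a single number passed to breaks is only a suggestion, one way to guarantee exactly one cell per integer value is to pass the breakpoints as a vector. The sketch below uses invented values and plot = FALSE so that the computed cells can be inspected without drawing anything.

```r
lat_roots <- c(3, 5, 3, 7, 5, 3, 4)   # invented integer counts

# A single number: hist() picks "pretty" breakpoints near this count.
h_suggested <- hist(lat_roots, breaks = length(table(lat_roots)), plot = FALSE)
h_suggested$breaks

# Explicit breakpoints centred on each integer: one cell per value.
edges   <- seq(min(lat_roots) - 0.5, max(lat_roots) + 0.5, by = 1)
h_exact <- hist(lat_roots, breaks = edges, plot = FALSE)
h_exact$counts   # counts for the values 3, 4, 5, 6 and 7
```

Dropping plot = FALSE draws the histogram with the same cells.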

Simple pie plot

We create a new variable called data.group by using cut to divide the values into the intervals delimited by the break points in the vector.

cut(x, breaks, labels = NULL,
     include.lowest = FALSE, right = TRUE, dig.lab = 3,
     ordered_result = FALSE, ...)

   x: a numeric vector which is to be converted to a factor by
      cutting.

   breaks: either a numeric vector of two or more unique cut points or a
      single number (greater than or equal to 2) giving the number
      of intervals into which ‘x’ is to be cut.

   labels: labels for the levels of the resulting category.  By default,
      labels are constructed using ‘"(a,b]"’ interval notation.  If
      ‘labels = FALSE’, simple integer codes are returned instead
      of a factor.
In [ ]:
data.group <- cut( 
	data$Lat_roots, 
	c(0,5,10,100)) 

data.contingency_table <- table(data.group)

pie(data.contingency_table)
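With the default right = TRUE, each interval is open on the left and closed on the right, so a value equal to the lowest break point is not assigned to any interval. The sketch below, on invented values, shows this and how include.lowest = TRUE changes it.

```r
x <- c(0, 1, 5, 6, 10, 50)   # invented values

g_default <- cut(x, breaks = c(0, 5, 10, 100))
g_default    # 0 lies outside (0,5] and becomes NA

g_lowest <- cut(x, breaks = c(0, 5, 10, 100), include.lowest = TRUE)
g_lowest     # the first interval is now [0,5], so 0 is kept

table(g_default)   # NA values are not counted by default
```

If the real data contain zeros, the second form may be the safer choice.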

Manipulating the imported data frame

Create a column called lateralization_factor based on the "cuts" of Lat_roots

In [ ]:
# Passing the labels to cut() keeps all three levels even if one interval is empty
data$lateralization_factor <- cut(
	data$Lat_roots,
	breaks = c(0,5,10,20),
	labels = c("Low","Medium","High")
)
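To see which label each value receives, including values that sit exactly on a break point, here is a short sketch on invented numbers:

```r
lat_roots <- c(2, 5, 6, 10, 15)   # invented values

lvl <- cut(lat_roots,
           breaks = c(0, 5, 10, 20),
           labels = c("Low", "Medium", "High"))

lvl          # 5 falls in "Low" and 10 in "Medium" (intervals are right-closed)
table(lvl)   # counts per label
```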

Showing the result

In [ ]:
data$lateralization_factor
head(data)

Calculating the mean of length per row

In [ ]:
data$length_mean <- rowMeans(data[,2:4])
In [ ]:
head(data)
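A small sketch of rowMeans on an invented data frame, including its behaviour when a measurement is missing. The column names are hypothetical.

```r
# Invented data: one id column followed by three length measurements.
toy <- data.frame(id    = c("A", "B"),
                  len_1 = c(1, 4),
                  len_2 = c(2, 5),
                  len_3 = c(3, 9))

toy$length_mean <- rowMeans(toy[, 2:4])
toy$length_mean   # 2 and 6

# A missing value makes the whole row mean NA unless na.rm = TRUE is given.
toy$len_3[2] <- NA
mean_with_na <- rowMeans(toy[, 2:4])
mean_na_rm   <- rowMeans(toy[, 2:4], na.rm = TRUE)
```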

Adding a column with the standard deviation

sd(x) calculates the standard deviation of the given vector x

Since the measurements span several columns and we want sd() for each row, we can use apply()

Function apply()

 Returns a vector or array or list of values obtained by applying a
 function to margins of an array or matrix.

Usage:

 apply(X, MARGIN, FUN, ...)

Arguments:

   X: an array, including a matrix.

MARGIN: a vector giving the subscripts which the function will be
      applied over.  E.g., for a matrix ‘1’ indicates rows, ‘2’
      indicates columns, ‘c(1, 2)’ indicates rows and columns.
      Where ‘X’ has named dimnames, it can be a character vector
      selecting dimension names.

 FUN: the function to be applied: see ‘Details’.  In the case of
      functions like ‘+’, ‘%*%’, etc., the function name must be
      backquoted or quoted.

 ...: optional arguments to ‘FUN’.

So we use:

X: the columns data[,2:4], which apply() coerces to a matrix

MARGIN: '1', since we want to apply the function to each row

FUN: the function we want to apply is 'sd'
In [ ]:
data$sd<-apply(data[,2:4],1, sd)
In [ ]:
head(data)
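The effect of MARGIN can be checked on a tiny invented data frame where the row standard deviations are easy to verify by hand. The column names are hypothetical.

```r
toy <- data.frame(id    = c("A", "B"),
                  len_1 = c(1, 2),
                  len_2 = c(2, 2),
                  len_3 = c(3, 2))

# MARGIN = 1: one standard deviation per row.
toy$sd <- apply(toy[, 2:4], 1, sd)
toy$sd   # sd(c(1, 2, 3)) is 1 and sd(c(2, 2, 2)) is 0

# MARGIN = 2: one standard deviation per column instead.
col_sd <- apply(toy[, 2:4], 2, sd)
col_sd
```

Extra arguments after FUN are passed on to it, for example apply(toy[, 2:4], 1, sd, na.rm = TRUE).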

Saving the data frame to a file that can be imported into Excel

write.csv prints its required argument ‘x’ (after converting it to a data frame if it is not one nor a matrix) to a file or connection.

write.csv() is a wrapper around write.table() with sep = "," and dec = "." fixed; attempts to change them are ignored with a warning.

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA", dec = ".", row.names = TRUE, col.names = TRUE, fileEncoding = "")

x: the object to be written, preferably a matrix or data frame.
      If not, it is attempted to coerce ‘x’ to a data frame.

file: either a character string naming a file or a connection open
      for writing.  ‘""’ indicates output to the console.

append: logical. Only relevant if ‘file’ is a character string.  If
      ‘TRUE’, the output is appended to the file.  If ‘FALSE’, any
      existing file of the name is destroyed.

quote: a logical value (‘TRUE’ or ‘FALSE’) or a numeric vector.  If
      ‘TRUE’, any character or factor columns will be surrounded by
      double quotes.  If a numeric vector, its elements are taken
      as the indices of columns to quote.  In both cases, row and
      column names are quoted if they are written.  If ‘FALSE’,
      nothing is quoted.

 sep: the field separator string.  Values within each row of ‘x’
      are separated by this string.

 eol: the character(s) to print at the end of each line (row).  For
      example, ‘eol = "\r\n"’ will produce Windows' line endings on
      a Unix-alike OS, and ‘eol = "\r"’ will produce files as
      expected by Excel:mac 2004.

  na: the string to use for missing values in the data.

 dec: the string to use for decimal points in numeric or complex
      columns: must be a single character.

row.names: either a logical value indicating whether the row names of
      ‘x’ are to be written along with ‘x’, or a character vector
      of row names to be written.

col.names: either a logical value indicating whether the column names
      of ‘x’ are to be written along with ‘x’, or a character
      vector of column names to be written.  See the section on
      ‘CSV files’ for the meaning of ‘col.names = NA’.

fileEncoding: character string: if non-empty declares the encoding to
      be used on a file (not a connection) so the character data
      can be re-encoded as they are written.  See ‘file’.
In [ ]:
#write.csv(data,file="data/new_root_length.csv")
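As a sketch of what ends up in the file, the example below writes an invented data frame to a temporary path (not the tutorial's data/ folder) and reads the raw lines back. It also shows write.csv2, which uses ";" as separator and "," as decimal mark, the same convention as the input file read at the start; whether Excel expects that convention depends on its regional settings.

```r
toy <- data.frame(id = c("A", "B"), length_mean = c(1.5, 2.25))  # invented data
out_file <- tempfile(fileext = ".csv")   # temporary, hypothetical output path

# Comma-separated, "." as decimal mark; row.names = FALSE omits the row-number column.
write.csv(toy, file = out_file, row.names = FALSE)
lines_csv <- readLines(out_file)
lines_csv

# Semicolon-separated, "," as decimal mark.
write.csv2(toy, file = out_file, row.names = FALSE)
lines_csv2 <- readLines(out_file)
lines_csv2

# Round trip: read the second file back with matching sep and dec.
back <- read.table(out_file, sep = ";", dec = ",", header = TRUE)
```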